Background of the Invention
Brief Description of the Drawings



FIG. 1 is a block diagram illustrating a portion of an apparatus for rate control for 
a constant-bit-rate finite-buffer-size video encoder in accordance with an embodiment of 
the invention. 

FIG. 2 is a block diagram illustrating a portion of an apparatus for rate control for 
a constant-bit-rate finite-buffer-size video encoder in accordance with an embodiment of 
the invention. 

FIG. 3 is a flow diagram illustrating a method for rate control for a constant-bit- 
rate finite-buffer-size video encoder in accordance with an embodiment of the invention. 

FIG. 4 is a flow diagram illustrating a method for rate control for a constant-bit- 
rate finite-buffer-size video encoder in accordance with an embodiment of the invention. 

FIG. 5 is a flow diagram illustrating a method for rate control for a constant-bit- 
rate finite-buffer-size video encoder in accordance with an embodiment of the invention. 

FIG. 6 is a flow diagram illustrating a method for rate control for a constant-bit- 
rate finite-buffer-size video encoder in accordance with an embodiment of the invention. 

FIG. 7 is a flow diagram illustrating a method for rate control for a constant-bit- 
rate finite-buffer-size video encoder in accordance with an embodiment of the invention. 

FIG. 8 is a flow diagram illustrating a method for rate control for a constant-bit- 
rate finite-buffer-size video encoder in accordance with an embodiment of the invention. 

FIG. 9 is a flow diagram illustrating a method for rate control for a constant-bit- 
rate finite-buffer-size video encoder in accordance with an embodiment of the invention. 




Detailed Description of a Preferred Embodiment 

A method and apparatus for rate control for a constant-bit-rate finite-buffer-size
video encoder is described. Rate control is provided by adjusting the size of non-intra
frames based on the expected size of future intra frames. Here, the size of a frame is the
number of bits in the encoded, or compressed, frame. A sliding window approach is
implemented to avoid excessive adjustment of non-intra frames located near the end of a
group of pictures. A measurement of "power" based on a sum of absolute values of pixel
values is used. The "power" measurement is used to adjust a global complexity value,
which is used to adjust the sizes of frames. The global complexity value responds to
scene changes.

An embodiment of the invention calculates and uses L1 distances and pixel block
complexities to provide rate control. An embodiment of the invention implements a
number of bit predictor blocks. Predictions may be performed at a group-of-pictures
level, at a picture level, and at a pixel block level. An embodiment of the invention resets
a global complexity parameter when a scene change occurs.

Video data is organized as a sequence of frames. A frame represents an
instantaneous image. Thus, the video data may be thought of as being divided in time
into frames. The frames may be divided in space into smaller elements of the frames. As
an example, the frames may be divided into an array of pixels. Frames may also be
divided into groups of pixels, referred to as macroblocks or pixel blocks. One example of
a macroblock or pixel block is a 16 x 16 array of pixels.

The present invention is capable of advantageously using compression properties
from past frames (frames that already have been compressed) and, possibly, the current
frame, rather than requiring compression properties of future frames. High quality of
compressed video is provided in accordance with accurate prediction of compression
properties of future frames based on the available compression properties of past frames.
The rate-control features in accordance with an embodiment of the invention
generate an accurate approximation of the desired number of bits in a single pass without
iterations. Additionally, the present invention affords robust rate control.



FIG. 1 is a block diagram illustrating a portion of an apparatus for rate control for
a constant-bit-rate finite-buffer-size video encoder in accordance with an embodiment of
the invention. FIG. 1 includes reference frame block 101, motion estimation block 102,
motion compensated prediction block 103, uncompressed video frame block 104, adder
105, prediction error image block 106, preprocessing stage 107, discrete cosine transform
(DCT) block 108, quantization block 109, variable length coding (VLC) block 110, video
buffer verifier (VBV) 111, rate control 112, and complexity estimator 113.

Reference frame block 101 provides reference frames 114 and 115 to motion
estimation block 102. Uncompressed video frame block 104 provides uncompressed
video frames 118, 119, and 120 to motion estimation block 102, to adder 105, and to
preprocessing stage 107. Preprocessing stage 107 determines a power value 121 and a
local activity value 122. In one embodiment, the preprocessing stage 107 updates the
power value for each subsequent picture or frame being encoded.

Motion estimation block 102 provides a motion estimate 116 to motion
compensated prediction block 103. Motion compensated prediction block 103 provides a
pixel block type indication 117. Motion compensated prediction block 103 also provides
a motion compensated prediction frame 134 as a negative input to adder 105. Adder 105
subtracts the motion compensated prediction frame 134 from the uncompressed video
frame 119 and provides the result 123 to prediction error image block 106.

Prediction error image block 106 provides a prediction error image 124 to DCT
block 108. Prediction error image block 106 also determines when a scene change occurs
and provides a scene change indication 125 to complexity estimator 113. Prediction
error image block 106 further provides L1 distances 126. The L1 distances represent a power ...

... differences within a pixel block.

... DCT result 127 to quantization block 109.

... intra frames.

... block 202 is operatively coupled to the prediction ...

... 202 also receives VBV fullness output 130, pixel ...

... type 117 and provides a target quantizer step size for a pixel block to pixel-block-level
rate control block 203 and to number-of-bit predictor block 205.

Number-of-bit predictor block 205 receives L1 distances 126, pixel block
complexities 133, and pixel block type 117, as well as picture-level rate control output
207. The number-of-bit predictor predicts a number of bits generated by the video
encoder. Number-of-bit predictor block 205 provides a number-of-bit prediction output
208 to adder 204. MPEG stream 129 is provided to a number-of-bit counter 210. The
number-of-bit counter 210 provides an output 211 that is received by adder 204 as a
negative input. Adder 204 subtracts output 211 from number-of-bit prediction output 208
and provides the result 209 to pixel-block-level rate control block 203. Pixel-block-level
rate control block 203 receives local activity 122. Pixel-block-level rate control block
203 also receives L1 distances 126. Pixel-block-level rate control block 203 provides
quantizer step size 131.

FIG. 3 is a flow diagram illustrating a method for rate control for a constant-bit-rate
finite-buffer-size video encoder in accordance with an embodiment of the invention.
A sliding window approach is used with respect to the GOP being processed. The sliding
window approach avoids wide variations in rate control adjustments dependent upon the
location of a frame (or picture) in a GOP.
The method begins in step 301 and continues to step 302. In step 302, a first
quantizer step size is calculated such that a first number of bits generated at an output of
the constant-bit-rate finite-buffer-size video encoder is constant over a first given number
of frames (e.g., GOP) starting at a current frame. In step 303, the current frame is
incremented. In step 304, a second quantizer step size is calculated such that a second
number of bits generated at the output of the constant-bit-rate finite-buffer-size video
encoder is constant over a second given number of frames starting at the incremented
current frame. Thus, a full GOP is considered for each frame processed, rather than
considering only those frames remaining in a static GOP or waiting until a second static
GOP following the first static GOP is processed.

FIG. 4 is a flow diagram illustrating a method for rate control for a constant-bit-rate
finite-buffer-size video encoder in accordance with an embodiment of the invention.
The method begins in step 401. In step 402, a power value is calculated by calculating 
the sum of absolute values of pixel values over a first frame. Step 402 may include steps 
403, 404, and 405. In step 403, an average value of the pixel values in each of a plurality 
of pixel blocks (e.g., macroblocks) within the first frame is calculated. In step 404, for 
each of the plurality of pixel blocks, a sum of absolute differences between the pixel 
values in the respective pixel block and the average value is calculated. This step may be 
repeated for all pixel blocks in the picture (e.g., frame). In step 405, each sum of the 
absolute differences for each of the plurality of pixel blocks within the first frame are 
added to obtain a power value for the first frame. 

From step 402, the method continues to step 406. In step 406, a number of bits in 
a second frame are adjusted based on the sum of the absolute values of pixel values. The 
method ends in step 407. 
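
Steps 403 through 405 can be sketched as follows. This is a minimal illustration, not the encoder's actual implementation: it assumes a frame is given as a list of rows of luminance values, and the names `block_power`, `frame_power`, and the `block_size` parameter are hypothetical.

```python
def block_power(block):
    """Sum of absolute differences between each pixel and the block mean
    (steps 403 and 404 for a single pixel block)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)


def frame_power(frame, block_size=16):
    """Add the per-block sums over every pixel block of the frame (step 405).

    `frame` is a list of rows of pixel values; `block_size` is the edge
    length of a pixel block (16 for a 16 x 16 macroblock). Both names are
    assumptions made for this sketch.
    """
    height, width = len(frame), len(frame[0])
    total = 0.0
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in frame[top:top + block_size]]
            total += block_power(block)
    return total
```

A flat block contributes nothing to the power value, while a block with strong pixel variation contributes a large sum, so the power value grows with the amount of detail in the frame.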

FIG. 5 is a flow diagram illustrating a method for rate control for a constant-bit- 
rate finite-buffer-size video encoder in accordance with an embodiment of the invention. 
A power value may be used to adjust a global complexity, which may be expressed as $X_I$.
The method begins in step 501. In step 502, a reference global complexity is calculated 
for each intra frame encoded. In step 503, a reference power value is calculated for each 
intra frame encoded. 

In step 504, a power value is calculated for subsequent frames. In step 505, a 
global complexity is calculated by multiplying the reference global complexity by the 
power value and dividing by the reference power value. In step 506, the global 
complexity is used to adjust a frame size. The method ends in step 507. 
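
The scaling in step 505 amounts to a single proportion. The sketch below assumes the reference values were stored when the most recent intra frame was encoded; the function and parameter names are hypothetical.

```python
def scaled_global_complexity(ref_complexity, ref_power, power):
    """Step 505: scale the reference global complexity by the ratio of the
    current power value to the reference power value."""
    if ref_power <= 0:
        # Guard added for this sketch; the text does not discuss this case.
        raise ValueError("reference power value must be positive")
    return ref_complexity * power / ref_power
```

For example, if the power value of the current frame is 1.5 times the reference power value, the global complexity used for frame sizing is 1.5 times the reference global complexity.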

FIG. 6 is a flow diagram illustrating a method for rate control for a constant-bit-rate
finite-buffer-size video encoder in accordance with an embodiment of the invention.
The method begins in step 601. In step 602, a prediction error frame including a plurality
of pixel-level error values is obtained. In step 603, a sum of absolute values of the
pixel-level error values for a pixel block is calculated.

In step 604, an expected number of bits for the pixel block is calculated based on
the sum of the absolute values, which may be expressed as $p_{mb}$. Step 604 may include
steps 605 and 607 and/or step 608. In step 605, an expected number of bits for a frame in
which the pixel block is located is calculated. Step 605 may also include step 606. In
step 606, the expected number of bits for the pixel block are summed for all pixel blocks
in the frame. In step 608, for each pixel block in the frame, a pixel block complexity
value is multiplied by the sum of the absolute values of the pixel-level error values for the
pixel block and divided by a target quantizer step size for the frame. In step 607, the
expected number of bits for the frame is used to obtain constant-bit-rate video encoding.
In step 609, the expected number of bits for the pixel block is used to obtain
constant-bit-rate video encoding. The process ends in step 610. L1 distances may be usefully
employed in accordance with the method set forth above.
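
The prediction of steps 606 and 608 can be sketched as below. The sketch assumes that each pixel block is described by a pair of its pixel block complexity and its sum of absolute prediction error values; the function names and the tuple layout are hypothetical.

```python
def expected_block_bits(complexity, abs_error_sum, quant_step):
    """Step 608: complexity times the sum of absolute error values,
    divided by the target quantizer step size."""
    return complexity * abs_error_sum / quant_step


def expected_frame_bits(blocks, quant_step):
    """Step 606: sum the per-block expectations over all pixel blocks.

    `blocks` is an iterable of (complexity, abs_error_sum) pairs, a layout
    chosen only for this sketch.
    """
    return sum(expected_block_bits(c, s, quant_step) for c, s in blocks)
```

Doubling the target quantizer step size halves the predicted number of bits, which is the inverse relationship the rate control relies on when it picks a step size for a bit budget.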

FIG. 7 is a flow diagram illustrating a method for rate control for a constant-bit-rate
finite-buffer-size video encoder in accordance with an embodiment of the invention.
The method starts in step 701. In step 702, a relationship between a quantizer scale factor
and a number of encoded bits of a pixel block is predicted based on a known relationship
in previous pixel blocks of a same type. Step 702 may also include steps 703 and 704. In
step 703, a first relationship between the quantizer scale factor and a first number of
encoded bits of a first type of pixel block is predicted based on a first known relationship
in previous pixel blocks of the first type. In step 704, a second relationship between the
quantizer scale factor and a second number of encoded bits of a second type of pixel
block is predicted based on a second known relationship in previous pixel blocks of the
second type. As an example, these relationships may be pixel block complexities. As
can be seen, separate pixel block complexities may be determined for intra frame pixel
blocks and for non-intra frame pixel blocks.

From step 702, the process continues to step 705. In step 705, the quantizer scale
factor is used to control a pixel block level rate of the video encoder. Step 705 may
include step 706. In step 706, the quantizer scale factor is used together with L1
distances to control the pixel block level rate of the video encoder. In step 707, the
method ends.

FIG. 8 is a flow diagram illustrating a method for rate control for a constant-bit- 
rate finite-buffer-size video encoder in accordance with an embodiment of the invention. 
The method begins in step 801. In step 802, a group-of-pictures-level prediction for a 
number of bits encoded for a group of pictures is calculated. Step 802 may include step 
803. In step 803, the group-of-pictures-level prediction for the number of bits encoded 
for the group-of-pictures is calculated based on a global complexity value. 

From step 802, the method continues in step 804. In step 804, a picture-level 
prediction for a number of bits encoded for a picture is calculated. Step 804 may include 
step 805. In step 805, the picture-level prediction for the number of bits encoded for the 
picture is calculated based on a pixel block type, an L1 distance, and a pixel block
complexity. 

From step 804, the method continues to step 806. In step 806, a pixel-block-level 
prediction for a number of bits encoded for a pixel block is calculated. Step 806 may 
include step 807. In step 807, the pixel-block-level prediction for the number of bits 
encoded for the pixel block is calculated based on a local activity value. 

From step 806, the method continues to step 808. In step 808, the group-of- 
pictures-level prediction, the picture-level prediction, and the pixel-block-level prediction 
are used to adjust a quantizer scale factor to provide the rate control for the video 
encoder. The method ends in step 809. Thus, the method utilizes prediction of a number 
of bits at the GOP level, the picture (e.g., frame) level, and the pixel block (e.g., 
macroblock) level to achieve higher accuracy in prediction and more effective rate 
control. 

FIG. 9 is a flow diagram illustrating a method for rate control for a constant-bit-rate
finite-buffer-size video encoder in accordance with an embodiment of the invention.
The method begins in step 901. In step 902, a scene change indication is obtained from a 
prediction error image. This may be done, for example, by looking at the ratio between 
intra and non-intra coded macroblocks. From step 902, the method continues to step 903. 
In step 903, the scene change indication is used to reset a global complexity history (e.g.,
$X_{ipb}$). From step 903, the method continues to step 904. In step 904, the global
complexity history is used to provide the rate control for the video encoder. 
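
Steps 902 and 903 can be illustrated with the sketch below. The text only says that the ratio between intra and non-intra coded macroblocks may be examined; the threshold value, the meaning of "reset" (here: discard all earlier values), and all names are assumptions made for illustration.

```python
def is_scene_change(intra_blocks, non_intra_blocks, ratio_threshold=1.0):
    """Step 902: flag a scene change when intra coded macroblocks outnumber
    non-intra coded ones by more than `ratio_threshold` (hypothetical value)."""
    if non_intra_blocks == 0:
        return intra_blocks > 0
    return intra_blocks / non_intra_blocks > ratio_threshold


def update_history(history, new_complexity, scene_change):
    """Step 903: on a scene change, drop the complexity history so that only
    the value measured after the change remains (assumed reset behavior)."""
    if scene_change:
        return [new_complexity]
    return history + [new_complexity]
```

Discarding the history prevents complexity values from the previous scene from influencing bit allocation in the new scene.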

FIG. 10 is a flow diagram illustrating a method for rate control for a constant-bit-rate
finite-buffer-size video encoder in accordance with an embodiment of the invention.
The method begins in step 1001. In step 1002, a prediction for a number of bits encoded
for a pixel block is calculated based on an L1 distance, a pixel block complexity, and a
quantizer scale factor. In step 1003, the prediction is used for adjusting the quantizer
scale factor (e.g., mquant) to meet a targeted picture-level number of bits. The method
ends in step 1004.

In accordance with an embodiment of the invention, the rate control process is 
organized hierarchically as follows: 

• GOP level: distributes bits to I, P and B pictures based on the GOP structure (IBP 
pattern) and the statistical properties of the individual picture types 

• picture level: calculates the target bit allocation and mquant for the next picture 
based on statistical properties of that particular picture 

• macroblock level: adjusts mquant to meet the target bit allocation (optional)

In addition, the rate control handles the following tasks:

• VBV compliance (bitrate adjustment, emergency quant mode, bit stuffing) 

• psychovisual masking (spatial activity based mquant modulation) 

• estimation of various rate control parameters (adaptive rate control algorithm) 

GOP level rate control

The proportion of bits allocated to a picture depends on its picture type (I, P, or 
B). The allocation is based on the goal of achieving fixed mquant ratios as follows: 

$$ mquant_I / mquant_P / mquant_B = K_I / K_P / K_B \qquad (1) $$

or, equivalently:

$$ \frac{mquant_{ipb}}{K_{ipb}} = \frac{1}{c} = \text{const} \qquad (2) $$

Throughout this document an index of ipb can have one of the values I, P, or
B and indicates a picture type specific entity. In (2), $c$ is a constant that depends on the
bitrate and frame statistics.






The relationship between the mquant (or quantiser_scale) value used for encoding
a frame and the resulting number of bits is complex and the only way to accurately
calculate it is by actually encoding the frame at the given mquant value. For the purpose
of rate control a highly simplified model is used instead as follows:

$$ R_{ipb} = X_{ipb} \cdot \frac{1}{mquant_{ipb}} \qquad (3) $$

An inverse proportional relationship is assumed between mquant and $R_{ipb}$, the
number of bits per frame (or bitrate). In this document, all bitrates are expressed as bits
per frame instead of bits per second, therefore the terms bits per frame and bitrate are
used interchangeably.

$X_{ipb}$ denotes global (coding) complexity and characterizes the encoding process
and its dependency on the frame content. In practice, $X_{ipb}$ is a function of mquant but
rate control assumes it to be constant. $X_{ipb}$ is determined by parameter estimation as
described later (cf. (27) and (31)).

Combining (2) and (3) results in

$$ R_{ipb} = c \frac{X_{ipb}}{K_{ipb}} = c X'_{ipb} \qquad (4) $$

where $X'$ is a short notation for $X/K$ (normalized complexity). The average bitrate $R$
for an entire GOP can be calculated as

$$ R = \frac{1}{N} \sum_{ipb} N_{ipb} R_{ipb} \qquad (5) $$

where $N_{ipb}$ is the number of frames of a particular type in a GOP, and $N$ is the total
number of frames per GOP. For example, the typical case of IBBPBBPBBPBBPBB
corresponds to $N = 15$, $N_I = 1$, $N_P = 4$, $N_B = 10$.

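Substituting (4) into (5):

$$ R = \frac{c}{N} \sum_{ipb} N_{ipb} X'_{ipb} \qquad (6) $$

and solving for $c$:

$$ c = \frac{N R}{\sum_{ipb} N_{ipb} X'_{ipb}} \qquad (7) $$

allows the individual $R_{ipb}$ values to be calculated (using (4)) as a function of the
complexities $X'_{ipb}$ and the average bitrate $R$:

$$ R_{ipb} = \frac{N X'_{ipb}}{\sum_{ipb} N_{ipb} X'_{ipb}} R \qquad (8) $$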

(Note that if $\frac{1}{N} \sum_{ipb} N_{ipb} X'_{ipb}$ is interpreted as average GOP
complexity $X'_{GOP}$, (8) simplifies to $R_{ipb} = \frac{X'_{ipb}}{X'_{GOP}} R$.)
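
The allocation rule of equation (8), in which each picture type receives a share of the GOP budget proportional to its normalized complexity, can be sketched as follows. The dictionary layout, the function name, and the complexity values in the usage note are illustrative assumptions, not values from the text.

```python
def gop_bit_allocation(avg_bits, counts, norm_complexity):
    """Equation (8): bits per frame for each picture type.

    avg_bits        -- average bitrate R in bits per frame
    counts          -- N_ipb, number of frames of each type in the GOP
    norm_complexity -- X'_ipb = X_ipb / K_ipb for each type
    """
    n = sum(counts.values())  # N, total frames per GOP
    weighted = sum(counts[t] * norm_complexity[t] for t in counts)
    return {t: n * norm_complexity[t] * avg_bits / weighted for t in counts}
```

With the IBBPBBPBBPBBPBB pattern (1 I, 4 P, 10 B frames), hypothetical normalized complexities of 8, 4, and 1, and an average of 340 bits per frame, the sketch yields 1200, 600, and 150 bits for I, P, and B frames. The per-GOP total, 1200 + 4 x 600 + 10 x 150 = 5100, equals 15 x 340, so the average bitrate is preserved.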



Normalized VBV fullness 

As a prerequisite for deriving the GOP level bitrate control equation, this section
defines the concept of an actual and a normalized VBV fullness and their relationship.
This is based on the observation that the difference between an expected actual VBV
fullness and the current actual VBV fullness has a component that depends on the
complexities, GOP structure, bitrate differences and position in the GOP pattern, which is
undesirable. Introducing the concept of a normalized VBV fullness removes these
dependencies.

The normalized VBV fullness is defined as the number of bits in the VBV if
every frame had been allocated the average number of bits per frame $R$, whereas
the actual VBV fullness is based on allocating bits according to (8). The actual VBV
fullness for the $M$'th frame (note that this $M$ is not the I/P frame distance used in defining
GOP patterns) in a GOP can be expressed as:

$$ E_{R,M} = E_{R,0} + M R_0 - \sum_{k=0}^{M-1} R_{ipb(k)} = E_{R,0} + M R_0 - \sum_{ipb} M_{ipb} R_{ipb} \qquad (9) $$

Here, $E_{R,0}$ is the VBV fullness at the start of the GOP, $R_0$ is the constant bitrate of the
VBV buffer model (i.e. the bit_rate parameter in the sequence header of the MPEG
stream, converted from bits per second to bits per frame), $M_{ipb}$ is the number of I, P, and
B frames, respectively, in the current GOP up to, but not including, the current ($M$'th)
frame, and $ipb(k)$ is the picture type of the $k$'th frame.

The normalized VBV fullness is simply:

$$ \tilde{E}_{R,M} = E_{R,0} + M (R_0 - R) \qquad (10) $$

It increases or decreases linearly over time, and is constant if the average bitrate matches 
the nominal bitrate of the stream. 

Subtracting (9) from (10) allows conversion between actual and normalized buffer
fullness:

$$ \tilde{E}_{R,M} = E_{R,M} + \sum_{ipb} M_{ipb} R_{ipb} - M R \qquad (11) $$

Introducing the fraction of bits per GOP spent up to, but not including, the $M$'th frame,

$$ \sigma_M = \frac{\sum_{ipb} M_{ipb} R_{ipb}}{N R} \qquad (12) $$

and the normalized difference between the actual and normalized allocation, $\delta_M$,

$$ \delta_M = \frac{\sum_{ipb} M_{ipb} R_{ipb} - M R}{N R} = \sigma_M - \frac{M}{N} \qquad (13) $$

equation (11) can be rewritten as:

$$ \tilde{E}_{R,M} = E_{R,M} + N R \delta_M \qquad (11a) $$

For the special case of $R = R_0$ (nominal bitrate), and $E_{R_0,0} = E_0$ (nominal VBV
fullness), equations (9) and (10) become

$$ E_{R_0,M} = E_0 + M R_0 - \sum_{ipb} M_{ipb} R_{ipb} = E_0 - N R_0 \delta_M \qquad (9a) $$

$$ \tilde{E}_{R_0,M} = E_0 \qquad (10a) $$

GOP level rate control equation

GOP level rate control adjusts the average bitrate $R$ to ensure VBV compliance,
which indirectly results in constant bitrate operation. Essentially it changes $R$
proportionally to the deviation of the actual from the expected VBV fullness. (Note that
this only guarantees that there is no long term drift between VBV and encoder but does
not prevent temporary VBV underflow or overflow; this is handled separately.) The
control equation is expressed as follows:

$$ \tilde{E}_{R,M+N_t} = \tilde{E}_{R_0,M+N_t} \qquad (14) $$

(i.e., the bitrate $R$ is set such that the expected normalized VBV fullness reaches the
nominal normalized VBV fullness after $N_t$ frames).

The remaining step is to convert (14) into an explicit equation for $R$. Using (10a)
and (10), (14) becomes:

$$ E_0 = \tilde{E}_{R,M} + N_t (R_0 - R) \qquad (14a) $$

Substituting (9a), (10a), and (11a) into (14a) results in:

$$ E_{R_0,M} + N R_0 \delta_M = E_{R,M} + N R \delta_M + N_t (R_0 - R) \qquad (14b) $$

Solving (14b) for $R$:

$$ R = R_0 + \frac{E_{R,M} - E_{R_0,M}}{N_t - N \delta_M} \qquad (15) $$

As expected, the rate is adjusted proportionally to the difference between current
and expected VBV fullness. The term $-N \delta_M$ in the denominator stems from the
conversion from actual to normalized VBV levels, removing GOP position dependencies
from the equation.
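
The GOP level control step can be sketched as follows: first the normalized allocation difference of equation (13) is computed, then the average bitrate is adjusted according to equation (15). The function names, the argument layout, and the guard against a non-positive denominator are assumptions of this sketch.

```python
def normalized_allocation_difference(frames_done, counts, bits_per_type, avg_bits):
    """Equation (13): delta_M = sigma_M - M / N.

    frames_done   -- M_ipb, frames of each type already coded in this GOP
    counts        -- N_ipb, frames of each type in a full GOP
    bits_per_type -- R_ipb from the GOP level allocation
    avg_bits      -- R, average bits per frame
    """
    n = sum(counts.values())
    m = sum(frames_done.values())
    spent = sum(frames_done[t] * bits_per_type[t] for t in frames_done)
    return spent / (n * avg_bits) - m / n


def adjusted_average_bitrate(nominal_rate, fullness, expected_fullness,
                             gop_length, horizon, delta_m):
    """Equation (15): R = R_0 + (E_{R,M} - E_{R_0,M}) / (N_t - N * delta_M)."""
    denominator = horizon - gop_length * delta_m
    if denominator <= 0:
        # Not discussed in the text; rejected here to keep the sketch safe.
        raise ValueError("N_t must exceed N * delta_M")
    return nominal_rate + (fullness - expected_fullness) / denominator
```

When the VBV is fuller than expected, the encoder has produced fewer bits than planned, so the sketch returns a rate above the nominal rate; a larger horizon `horizon` ($N_t$) gives a gentler correction.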

Picture level bit allocation 

At the GOP level the bit allocation for pictures is determined by (4). As discussed
below, the complexities $X'_{ipb}$ used in this equation are a posteriori estimates optimized to
provide an accurate long term estimate of the bitrate versus mquant relationship.

Bit allocation for the current picture is improved by using a priori knowledge of
its statistical properties provided by the motion estimator. In addition, picture level bit
allocation is responsible for preventing VBV underflows.

Picture level bit allocation models the relationship between the target mquant for
the current picture, $d$, and target bit allocation for the current picture, $T$, by an equation
similar to (3):

$$ T = \hat{X}' \cdot \frac{1}{d} \qquad (3a) $$

where $\hat{X}'$ is the a priori knowledge based normalized complexity of the current frame.
Computation of $\hat{X}'$ is discussed below (cf. equation (32)); it is based on L1 distances for
the individual macroblocks, and local complexity estimates for intra and non-intra
macroblocks.

Having two different estimates for the complexity of the current frame ($X'_{ipb}$, the
'typical' complexity derived as a long-term average based on a posteriori knowledge about
previously coded frames, and $\hat{X}'$, the 'actual' complexity based on a priori knowledge
about the current, not yet encoded frame) leads to a variety of possible bit allocation
schemes for the current frame. The two corner cases are as follows:

• mquant preserving mode: use the mquant as determined by GOP level rate control
($d = 1/c$, cf. (2)); the resulting number of bits may not match the number predicted
by GOP level rate control; this mode keeps quality constant but may cause significant
spikes in the allocation for frames that are more complex than anticipated at the GOP
level

• bitrate preserving mode: try to encode the frame with a number of bits as close as 
possible to the number of bits allocated at the GOP level by adjusting the value of 
mquant; this mode results in higher stability (no unpredicted excursions in the VBV 
level), but may result in very large mquant values at scene changes (resulting in 
noticeable blockiness) and unnecessarily low mquant values for repeated frames 
(large mquant fluctuations for 3:2 pulldown material) 



These corner cases, and all the intermediate ones, can be described using the
notion of an effective complexity $X''$ in (4) as follows:

$$ T = c X'' \qquad (4a) $$

Mquant preserving mode corresponds to setting $X'' = \hat{X}'$, while bitrate preserving mode
corresponds to $X'' = X'_{ipb}$.



One embodiment of the invention uses the following equation to determine the
effective complexity $X''$:

$$ X'' = \begin{cases} X'_I & \text{scene change} \\ \min\left\{ \hat{X}', \frac{\hat{X}' + X'_{ipb}}{2} \right\} & \text{otherwise} \end{cases} \qquad (16) $$

In (16), $X'_I$ is the normalized complexity of I frames, $X'_{ipb}$ is the normalized complexity
of frames of the type of the current frame (these are the same complexities as used by the
GOP level rate control), and $\hat{X}'$ is the a priori knowledge based normalized complexity
of the current frame.
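
The selection rule described for (16) (the I frame complexity at a scene change; otherwise the smaller of the a priori complexity and its average with the long-term complexity) can be sketched as follows. The function and parameter names are hypothetical.

```python
def effective_complexity(x_apriori, x_typical, x_intra, scene_change):
    """Effective complexity X'' used in T = c * X''.

    x_apriori -- a priori normalized complexity of the current frame
    x_typical -- long-term normalized complexity for this picture type
    x_intra   -- normalized complexity of I frames
    """
    if scene_change:
        return x_intra
    average = 0.5 * (x_apriori + x_typical)
    # The minimum falls back to the a priori value for unusually simple
    # frames (for example repeated frames), which saves bits.
    return min(x_apriori, average)
```

A frame that is simpler than the long-term average keeps its own a priori complexity, whereas a frame that is more complex is pulled halfway toward the long-term average.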

By default (16) uses the average of $\hat{X}'$ and $X'_{ipb}$ to achieve a compromise
between the constant quality of mquant preserving mode and the higher stability of
bitrate preserving mode. The default mode of (16) is augmented by several
experimentally determined heuristics that improve behavior at certain highly
non-stationary events as follows:

• repeated frames (including dropped frames and 3:2 pulldown)

• scene changes

Repeated frames coded as P or B pictures tend to have very low complexity since
they can be very accurately predicted from their reference frame(s). With default mode
bit allocation, too many bits are allocated to these frames, and mquant drops to a very low
value. To avoid this behavior, (16) uses the minimum of $\hat{X}'$ and the average of $\hat{X}'$ and
$X'_{ipb}$. Whenever the (a priori) actual complexity of the current frame is lower than the
long term average complexity, (16) goes into mquant preserving mode, reducing the
number of allocated bits below the one predicted at the GOP level.

P and B frames across scene changes are mostly coded using intra macroblocks
and their encoding behaves similarly to that of I frames. Their complexity is usually
much higher than that of regular P and B frames. The default mode underestimates the
complexity of such a frame and therefore causes allocation of too few bits at an
undesirably high mquant. On the other hand, choosing the obvious alternative, mquant
preserving mode, can lead to extremely high bit allocation. This happens on scene
changes from a low complexity to a high complexity scene because mquant then is still
based on complexity values from the previous scene. Instead, (16) uses $X'_I$, the I frame
complexity. This provides improved performance based on the following:

1. P and B frames across a scene change behave like an I frame (mostly intra coded
macroblocks)

2. as discussed below, $X'_I$ is adjusted for every picture (not just for I frames) based on
the L1 variance of the current frame, and therefore already takes the changed
complexity of the new scene into account

Experiments have confirmed that explicitly using the I frame complexity $X'_I$ at
scene changes results in an allocation that avoids huge mquant spikes and also avoids bit
allocations that are much higher than the I frame bit allocation. Only if scene changes are
not properly detected (which happens when they occur immediately before an I frame) are B
frames encoded with higher than optimal mquant.

VBV compliance 

Using the target bit allocation $T$ given in equation (4) results in a bitstream that
has constant average bitrate $R_0$, but does not guarantee VBV compliance, i.e. occasional
VBV underflows or overflows may occur. Therefore $T$ is adjusted based on the
restrictions imposed by the VBV model:

$$ \ldots \qquad (17) $$

$T_{min}$ is a lower boundary for the number of bits required to avoid VBV overflow:

$$ \ldots \qquad (18) $$

Here $R_0$ is the nominal bitrate, vbv_buffer_size the value encoded in the sequence header,
and $E_{R,M}$ the VBV fullness before encoding the current frame. $f_{lim}$ is a soft limiter
defined by the following equation:

$$ \ldots \qquad (19) $$

For large $x$, this function asymptotically converges to $x_{max}$. The final value for the target
mquant is obtained by inserting $T''$ in (3a):

$$ d'' = \frac{\hat{X}'}{T''} \qquad (3b) $$



Macroblock level rate control 

Based on the target mquant $d''$, macroblock level rate control determines the
actual mquant for each macroblock in the picture taking the following aspects into
account:

• psychovisual masking by local activity modulation

• adaptation of mquant to meet target bit allocation ($T''$) by using feedback

• support of fractional mquant values by using dithering

Psychovisual Masking 

A preprocessing stage computes the relative local activity $act_{mb}$ of every
macroblock as

$$ \bar{u}(mb,b) = \frac{1}{\ldots} \sum_{i,j} u_{i,j}(mb,b) $$

$$ \ldots $$

$$ \overline{act'} = \frac{1}{n_{mb}} \sum_{mb} act'_{mb} \qquad (20) $$

Here $u_{i,j}(mb,b)$ is the pixel value of the ij-th pixel in block b of macroblock mb,
$\bar{u}(mb,b)$ is the average pixel value of block b of macroblock mb, $act'_{mb}$ is the activity of
macroblock mb, $\overline{act'}$ is the average activity of the picture, $act_{mb}$ is the relative activity of
macroblock mb, and $n_{mb}$ is the total number of macroblocks in the picture.

The relative activity is mapped to an activity scaling factor $a_{act,mb}$ using the
following non-linear relation:

$$ a_{act,mb} = \frac{\ldots}{act_{mb} + m_{act}} \qquad (21) $$

The parameter $m_{act}$ determines the degree of activity modulation.
mquant is multiplied with this scaling factor:

$$ mquant_{mb} = a_{act,mb} \, d'' \qquad (22) $$

where $d''$ is the value from (3b).
Macroblock level control loop 


In order to reduce the mismatch between the target bit allocation T" and the 
actual number of bits generated for the current image, which is caused by the limited 
accuracy of the complexity model (3), a control loop adjusts mquant at the macroblock 
level based on the accumulated mismatch from the start of the picture up to the current 
macroblock. This improves the rate control stability. Too strong feedback, however, can 
result in large spatial variations of mquant due to local complexity changes in the image. 
The following control equation is used: 

    mquant'_{mb} = mquant_{mb} + k_{mb} \cdot (S_{mb} - \hat{S}_{mb})        (23) 

S_mb is the number of generated bits up to, but not including, macroblock number mb. 
\hat{S}_mb is the expected value of the same quantity. It is calculated as: 
where X_n is the estimated macroblock complexity of the n-th macroblock (cf. equation 
(33)). k_mb determines the loop gain of the first order loop. It is related to nmb, the 
number of macroblocks the (linearized) system requires to reduce a mismatch to 1/e of its 
original value ('time constant' of the control loop) as follows: 

d"n mb 

M -r£* (25) 
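
The control equation (23) can be illustrated by the following minimal sketch. It is 
not taken from the specification: the function name, argument names, and the numeric 
values are hypothetical, and the expected bit count is simply passed in rather than 
computed, since its defining equation is not reproduced here. 

```python
def adjust_mquant(mquant_mb, k_mb, bits_generated, bits_expected):
    """First-order feedback of (23).

    mquant_mb      -- activity-modulated mquant of the current macroblock
    k_mb           -- loop gain (0 disables the feedback)
    bits_generated -- bits produced so far, excluding the current macroblock
    bits_expected  -- expected value of the same quantity (supplied by caller)
    """
    # A positive mismatch (more bits spent than expected) raises mquant,
    # which coarsens quantization and reduces the bits that follow.
    return mquant_mb + k_mb * (bits_generated - bits_expected)


# Hypothetical numbers, for illustration only.
adjusted = adjust_mquant(8.0, 0.001, 12500, 12000)
```

With a gain of zero the function returns mquant_mb unchanged, which corresponds to 
the deactivated control loop mentioned in the discussion of fractional mquant support. 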



Fractional mquant support 

The target mquant, d" , is a real-valued number, while the actual mquant used by 
the encoder is an integer. For small mquant, rounding d" to the nearest integer can 
result in a significant mismatch in the generated number of bits. Usually, this mismatch 
is compensated by the macroblock level control loop. If the latter is deactivated 
(k_mb = 0), the mquant values are dithered to approximate the real-valued target value on 
average. A simple, one-dimensional, 1-tap error diffusion filter is used for this purpose. 
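
One possible form of such a one-tap error diffusion is sketched below. The specification 
does not give the filter in detail, so the rounding rule, the lower clamp of 1, and all 
names are assumptions made for illustration. 

```python
def dither_mquant(targets):
    """Quantize real-valued mquant targets to integers with error diffusion.

    The rounding error of each macroblock is carried forward (a single tap)
    and added to the target of the next macroblock, so that the integer
    values match the real-valued targets on average.
    """
    out = []
    err = 0.0
    for target in targets:
        wanted = target + err
        q = max(1, int(wanted + 0.5))  # round half up; clamp at 1 (assumption)
        err = wanted - q               # error diffused to the next macroblock
        out.append(q)
    return out


# A constant target of 2.25 yields mostly 2 with an occasional 3.
values = dither_mquant([2.25] * 8)
```

In this example the eight integer values sum to 18, which equals 8 x 2.25, so the 
average mquant matches the real-valued target. 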

Parameter estimation 

This section describes how various parameters used in the rate control algorithm 
are estimated from the actual content of the video sequence being encoded. 

Global complexities 

X_ipb, introduced in (3), is estimated from the relationship between mquant and 
the number of bits generated for previously encoded pictures. At the end of each frame, 
the frame complexity X of this frame is calculated as follows: 
    X = \begin{cases} 
          \dfrac{S \, d'' \, n_{mb}}{n_{valid,mb}} & n_{valid,mb} > 0 \\ 
          0 & n_{valid,mb} = 0 
        \end{cases}        (26) 



S is the number of bits generated for the frame, d" is the target mquant from (3b), n_mb is 
the total number of macroblocks in the frame, and n_valid,mb is the number of macroblocks in 
the frame not encoded in 'emergency quantization mode'. Emergency quantization mode 
is entered if the number of bits in a partially encoded frame exceeds a threshold that 
indicates potential VBV buffer underflow. In this mode almost no bits are generated for 
the remaining macroblocks (only DC / (0,0) coefficients are encoded), independently of d". 



For P and B frames, X can vary noticeably from frame to frame. It is highly 
dependent on the efficiency of motion compensation, which in turn depends on the scene 
content. To reduce the effect of content dependency, a scene-change adaptive low-pass 
filter is applied to X to produce X_ipb: 

    X_{ipb,k} = (1 - a_{sc,ipb}) \, X_{ipb,k-1} + a_{sc,ipb} \, X,    for ipb = P, B        (27) 

k denotes sequential numbers for frames of the same type. a_sc,ipb depends on the picture 
type (P or B) and on whether or not a scene change was detected. a_sc,ipb is set according 
to the following table: 



    a_sc,ipb    no scene change    scene change 
    P           0.75               0.5 
    B           0.5                0.25 
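
The filter and the table above can be sketched as follows. The sketch assumes that 
a_sc,ipb weights the complexity of the newest frame and that (1 - a_sc,ipb) weights the 
previous estimate; the function and variable names are hypothetical. 

```python
# Filter coefficients from the table: (picture type, scene change) -> a_sc
A_SC = {
    ("P", False): 0.75,
    ("P", True): 0.5,
    ("B", False): 0.5,
    ("B", True): 0.25,
}


def update_global_complexity(x_prev, x_frame, picture_type, scene_change):
    """Scene-change adaptive first-order low-pass filter for P and B frames.

    x_prev  -- previous filtered complexity of this picture type
    x_frame -- complexity of the frame that was just encoded
    """
    a = A_SC[(picture_type, scene_change)]
    return (1.0 - a) * x_prev + a * x_frame


# Hypothetical complexities, for illustration only.
x_p = update_global_complexity(1000.0, 2000.0, "P", False)
x_b = update_global_complexity(1000.0, 2000.0, "B", True)
```

Under this reading, a P frame without a scene change moves the estimate three quarters 
of the way toward the new value, while a B frame with a scene change moves it only one 
quarter of the way. 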



The same scheme could be applied to I frames as well. There are two drawbacks, 
however. First, the current scene detection scheme does not work for I frames (it is 
based on the intra vs. non-intra macroblock ratio). This would result in a non-adaptive a 
with a value close to 1.0. Second, I frames can be spaced considerably far apart (e.g. 
15 frames), resulting in long intervals without new estimates for X_I. This is undesirable 
because X_I not only affects bit allocation for I frames but indirectly also the allocation 
for P and B frames (i.e. an increased X_I reduces the number of bits allocated to P and B 
frames in anticipation of higher allocation requirements for the next I frame). Therefore 
an updated X_I is provided for every frame. To this end, the global I frame complexity is 
modeled as 

    X_I = X_0 \cdot P_{intra}        (28) 

where X_0 is a constant and P_intra is the total intra energy (or power) of the frame. 
P_intra is calculated as 

    P_{intra} = \sum_{mb=0}^{n_{mb}-1} p_{intra,mb}        (29) 

p_intra,mb is the intra energy of macroblock number mb as defined in (34) below. Note that 
p_intra,mb is calculated at the same time as act'_mb (cf. (20)) without significant 
additional computational overhead. 

An estimate for X_0 is obtained from the most recent I frame k: 

    X_{0,k} = \frac{X_k}{P_{intra,k}}        (30) 

with X from (26) and P_intra from (29). The index k denotes that these values are those of 
the k-th I frame. For all frames m between the k-th (inclusive) and (k+1)-th (exclusive) I 
frame, X_I,m is calculated from (28): 

    X_{I,m} = X_{0,k} \cdot P_{intra,m}        (31) 
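
The two steps, normalizing at each I frame and rescaling for every following frame, 
can be sketched as follows. The function names and the numeric values are hypothetical 
and serve only to illustrate the arithmetic. 

```python
def estimate_x0(x_k, p_intra_k):
    """Normalized I frame complexity from the most recent I frame k, as in (30)."""
    if p_intra_k <= 0:
        raise ValueError("total intra energy must be positive")
    return x_k / p_intra_k


def i_frame_complexity(x0_k, p_intra_m):
    """I frame complexity predicted for frame m from its intra energy, as in (31)."""
    return x0_k * p_intra_m


# Hypothetical values: the last I frame had complexity 600000 and intra energy 200000.
x0 = estimate_x0(600000.0, 200000.0)
# A later frame of any type with intra energy 250000 gets an updated estimate.
x_i = i_frame_complexity(x0, 250000.0)
```

Because the intra energy is available for every frame, the estimate follows changes 
in picture content between I frames even though X_0 itself is only refreshed when an 
I frame has been encoded. 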
A-priori complexity 



The normalized a-priori complexity for the current frame X' used in (3a) ff. is 
obtained from a-priori knowledge of the current frame before actually encoding it, in 
contrast to the 'a-posteriori' global complexity described in the previous section which is 
derived from values available only after actually encoding the frame. 



    X' = \frac{1}{n_{mb}} \sum_{mb=0}^{n_{mb}-1} X_{mb}        (32) 



X_mb is a macroblock complexity estimate which depends on the coding type of the 
macroblock: 

    X_{mb} = \begin{cases} 
      \dfrac{x_{intra} \, p_{intra,mb}}{a_{act,mb}}       & \text{intra coded macroblocks (I, P, B)} \\ 
      \dfrac{x_{nonintra,p} \, p_{zeromv,mb}}{a_{act,mb}}   & \text{zero motion vector macroblocks (P)} \\ 
      \dfrac{x_{nonintra,p} \, p_{nonintra,mb}}{a_{act,mb}} & \text{non-intra coded macroblocks (P)} \\ 
      \dfrac{x_{nonintra,b} \, p_{nonintra,mb}}{a_{act,mb}} & \text{non-intra coded macroblocks (B)} 
    \end{cases}        (33) 



a_act,mb from (21) in the denominator of (33) accounts for the mquant modulation in (22). 
x_intra, x_nonintra,p, and x_nonintra,b are the macroblock complexities for intra coded 
macroblocks, non-intra coded macroblocks in P frames, and non-intra coded macroblocks 
in B frames, respectively. p_intra,mb, p_zeromv,mb, and p_nonintra,mb are the macroblock 
energies (or power) of intra coded, zero-motion vector coded, and non-intra coded 
macroblocks, respectively: 
    p_{intra,mb}    = \sum_{b=0}^{3} \sum_{i,j=0}^{7} | u_{ij}(mb,b) - \bar{u}(mb,b) | 

    p_{zeromv,mb}   = \sum_{b=0}^{3} \sum_{i,j=0}^{7} | v_{0,ij}(mb,b) |        (34) 

    p_{nonintra,mb} = \sum_{b=0}^{3} \sum_{i,j=0}^{7} | v_{ij}(mb,b) | 

Here u_ij(mb,b), v_0,ij(mb,b), and v_ij(mb,b) are the pixel value, the zero motion vector 
prediction error, and the motion-compensated prediction error of the ij-th pixel in block 
b of macroblock mb, respectively. \bar{u}(mb,b) is the average pixel value of block b of 
macroblock mb, defined in (20). 
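
The macroblock energies can be sketched as sums of absolute values over the four 
8x8 blocks of a macroblock. The sketch assumes absolute (not squared) deviations and 
a plain list-of-lists block layout; both the layout and the function names are 
hypothetical. 

```python
def intra_block_energy(block):
    """Sum of absolute deviations of an 8x8 block from its own mean pixel value."""
    mean = sum(sum(row) for row in block) / 64.0
    return sum(abs(pixel - mean) for row in block for pixel in row)


def intra_energy(blocks):
    """p_intra,mb: intra energy summed over the four blocks of a macroblock."""
    return sum(intra_block_energy(block) for block in blocks)


def residual_energy(residual_blocks):
    """p_zeromv,mb or p_nonintra,mb: sum of absolute prediction errors.

    Pass zero-motion-vector residuals for the former and motion-compensated
    residuals for the latter.
    """
    return sum(abs(v) for block in residual_blocks for row in block for v in row)


# Illustration: a checkerboard of 0 and 2 has mean 1, so each pixel deviates by 1.
checker = [[0 if (i + j) % 2 == 0 else 2 for j in range(8)] for i in range(8)]
p_intra = intra_energy([checker] * 4)
```

A flat block contributes no intra energy, whereas the checkerboard block contributes 
one unit per pixel. 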

Intra / non-intra macroblock complexities 

x_intra, x_nonintra,p, and x_nonintra,b are a-posteriori estimates of the complexity of 
macroblocks of a particular type. They differ from the global complexities by being 
normalized with the macroblock energy (similar to X_0 in (30), but at the macroblock 
level). The underlying model for the number of bits generated for the current 
macroblock, s_mb, is: 

    s_{mb} = x \cdot p_{mb}        (35) 

with x and p chosen according to the current macroblock coding type and picture type. 

Estimates for x_intra, x_nonintra,p, and x_nonintra,b are obtained from previous 
macroblocks of the same type. 



    x = \frac{s_n}{p_n},  with 

    s_n = (1 - a_x) \, s_{n-1} + a_x \, (s_{mb} + s_0)        (36) 

    p_n = (1 - a_x) \, p_{n-1} + a_x \left( \frac{p_{mb}}{mquant_{mb}} + p_0 \right) 
Equation (36) is evaluated independently for all 3 variants of x (intra, nonintra,p, 
nonintra,b). s_n and p_n are updated whenever a macroblock of matching type has been 
encoded (skipped macroblocks are excluded). x is recalculated before starting a new 
picture. a_x determines the amount of low-pass filtering. It is preferably set to 10^-3. 
s_0 and p_0 are constants that stabilize x in case of low bitrate / low energy 
macroblocks. For x_intra, s_0 is preferably set to 75, and p_0 is preferably set to 50; 
otherwise, s_0 is preferably set to 50, and p_0 is preferably set to 25. This results in 
asymptotic values of 1.5 for x_intra, and of 2.0 for x_nonintra,p and x_nonintra,b. These 
constants have been determined by experiment. Thus, other values may be substituted, if 
desired, to obtain other results. 
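
The estimator can be sketched as two first-order low-pass filters whose ratio gives x. 
The sketch is one plausible reading of (36), not a definitive implementation: it assumes 
that s_0 and p_0 are added to the per-macroblock inputs of the filters, that the energy 
is divided by the macroblock's mquant, and that the filter states start at s_0 and p_0. 
The class and method names are hypothetical. 

```python
class MacroblockComplexityEstimator:
    """Running estimate of x for one macroblock type (intra, nonintra,p, nonintra,b)."""

    def __init__(self, s0, p0, a_x=1e-3):
        self.s0 = s0
        self.p0 = p0
        self.a_x = a_x
        # Starting the states at s0 and p0 is an assumption of this sketch.
        self.s_n = s0
        self.p_n = p0

    def update(self, bits, energy, mquant):
        """Call once per encoded (non-skipped) macroblock of the matching type."""
        a = self.a_x
        self.s_n = (1.0 - a) * self.s_n + a * (bits + self.s0)
        self.p_n = (1.0 - a) * self.p_n + a * (energy / mquant + self.p0)

    def x(self):
        """Recalculate x, e.g. before starting a new picture."""
        return self.s_n / self.p_n


# Preferred constants from the text: intra uses (75, 50), non-intra uses (50, 25).
intra = MacroblockComplexityEstimator(s0=75.0, p0=50.0)
nonintra = MacroblockComplexityEstimator(s0=50.0, p0=25.0)
```

With no bits and no energy, the ratios stay at 75/50 = 1.5 and 50/25 = 2.0, which agrees 
with the asymptotic values stated above. For high-energy macroblocks the constants 
become negligible and x approaches the ratio of generated bits to energy per mquant. 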

It should be understood that the implementation of other variations and 
modifications of the invention in its various aspects will be apparent to those of ordinary 
skill in the art, and that the invention is not limited by the specific embodiments 
described. For example, the specific type of stream being encoded may be varied. As 
another example, various aspects of the invention may be implemented without 
implementing other aspects. It is therefore contemplated to cover by the present 
invention, any and all modifications, variations, or equivalents that fall within the spirit 
and scope of the basic underlying principles disclosed and claimed herein. 